Poster: Arkiver (Administrator, Curator, or Staff) Date: Jan 7, 2014 10:42pm
Forum: faqs Subject: Re: the captures of my site are so sparse

Well, you can use robots.txt to tell crawlers not to crawl your website, but it won't work if the crawlers are "bad". The Google crawler, the IA crawler and many other crawlers stick to the rules in robots.txt, but a crawler can also just ignore those rules and do whatever it wants. So you can try to exclude bad crawlers, but it wouldn't help a lot...
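
To make the point above concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The bot name "BadBot", the `/private/` path and the example.com URLs are made up for illustration. Nothing here can force a crawler to run this check, which is exactly why a "bad" crawler is unaffected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the bot name "BadBot" and the paths are invented
# for this example and do not come from any real site.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set a crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# A polite crawler asks before every request and skips the URL on False.
# A "bad" crawler simply never makes this call.
print(rules.can_fetch("BadBot", "http://example.com/index.html"))       # False
print(rules.can_fetch("ia_archiver", "http://example.com/index.html"))  # True
print(rules.can_fetch("ia_archiver", "http://example.com/private/a"))   # False
```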

I don't think I can help you much more right now with your site's pages and how they show up in Google's search results. I think you should take a look at some SEO (Search Engine Optimization) articles on how to make your site easier to find in Google and other search engines.

It sounds horrible what happened to you with the car accident... There are some really horrible and disgusting people on this planet... :(

Poster: Medworks Date: Jan 8, 2014 1:39pm
Forum: faqs Subject: Re: the captures of my site are so sparse

Well, yes, obviously it would be completely outrageous for me to expect you to help me redesign my website(s); I'm astonished you did as much as you did.

Just a little FYI though: I just discovered that there IS in fact a reason to have a crawl delay in there. My website was SHUT DOWN and suspended by my webhost 3 hours ago because they got 100000 requests from a bot in the Netherlands. If, as you say, the "bad bots" don't obey the delay directive in the text file, then it was completely coincidental that this happened right when I removed the delay from robots.txt. More likely than not, though, it was a consequence of it, which means it was a "good bot", i.e. one I don't want to ban, but it just crawled so fast that it angered the webhost.

The InMotion Hosting representative actually said the opposite of what you did: he said they generally recommend a delay of 30 seconds. But I got him to put in a delay of 1 second before lifting the suspension and putting the site live again. So it's apparently bad to remove it entirely.

I can only guess that the environment of the internet is different now than it was at the beginning of 2011. As you noted, there was a whole 3 year period when there was no robots.txt file at all, and in all that time a single IP address never slammed the website with requests for lack of a delay in robots.txt. Yet it happened essentially as soon as I removed the delay line from robots.txt 2 days ago, here in 2014. So you might consider that it's good to have a delay after all, just not 30 seconds.
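
For reference, the delay discussed above is normally written as a `Crawl-delay` line in robots.txt. It is a non-standard directive that some crawlers honor and others ignore. The following is a rough sketch, assuming a crawler written in Python, of how a cooperative bot could read that value with the standard `urllib.robotparser` and pause between requests. The function names, the "ExampleBot" user agent and the `fetch` callback are hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 1 second per request.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
"""


def robots_crawl_delay(robots_txt: str, user_agent: str, default: float = 0.0) -> float:
    """Return the Crawl-delay (in seconds) that applies to user_agent, or default."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no delay is declared
    return float(delay) if delay is not None else default


def polite_crawl(urls, delay, fetch, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds after every request."""
    for url in urls:
        fetch(url)    # `fetch` is a caller-supplied download function (assumption)
        sleep(delay)


def hours_needed(requests: int, delay: float) -> float:
    """Lower bound on the hours a crawl takes if it waits `delay` s per request."""
    return requests * delay / 3600


delay = robots_crawl_delay(ROBOTS_TXT, "ExampleBot")
print(delay)                                   # 1.0
print(round(hours_needed(100000, delay), 1))   # 27.8
```

With a 1 second delay, the 100000 requests mentioned above would be spread over a bit more than a day instead of arriving in a burst. With a 30 second delay the same crawl would take roughly five weeks, which may be why a very long delay leads to sparse captures.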

Well, thanks for all your help. The IP address in the Netherlands wasn't anything associated with Alexa or the Wayback Machine, was it? You said you were doing something or other that would count in the hundreds of thousands with my site. The problem was just that it was too much, too fast.

Poster: Arkiver (Administrator, Curator, or Staff) Date: Jan 8, 2014 10:47pm
Forum: faqs Subject: Re: the captures of my site are so sparse

..... oh...
Gosh, I think I actually am that IP address in the Netherlands... :/
Well, I didn't expect that to happen, I am very very very sorry!! :(

This post was modified by Arkiver on 2014-01-09 06:47:55
